Ad i gives rewards *y* drawn from a Bernoulli distribution: *p(y|θi) ~ Bernoulli(θi)*.
θi is unknown, so we express our uncertainty about it by assuming a uniform prior distribution *p(θi) ~ U([0, 1])*.
Bayes' Rule: We approximate θi by its posterior distribution:
*p(θi|y) ~ Beta(number of successes + 1, number of failures + 1)*
At each round n, we take a random draw θi(n) from this posterior distribution *p(θi|y)*, for each ad i.
At each round n we select the ad i that has the highest θi(n).
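To see why the posterior takes this Beta form, here is the Bayes-rule step written out as a short derivation. The symbols s and f below are just shorthand for the number of successes and failures observed for ad i so far (they correspond to the counts N1i(n) and N0i(n) used in the algorithm steps).

```latex
\begin{align*}
  p(\theta_i) &= 1
      \quad \text{(uniform prior on } [0,1] \text{, i.e. Beta}(1,1)) \\
  p(y \mid \theta_i) &= \theta_i^{\,s}\,(1-\theta_i)^{\,f}
      \quad \text{(likelihood of } s \text{ successes and } f \text{ failures)} \\
  p(\theta_i \mid y) &\propto p(y \mid \theta_i)\,p(\theta_i)
      = \theta_i^{\,s}\,(1-\theta_i)^{\,f}
      \;\;\Longrightarrow\;\; \theta_i \mid y \sim \mathrm{Beta}(s+1,\; f+1)
\end{align*}
```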
Step 1: At each round n, we consider two numbers for each ad i:
N1i(n) - the number of times ad i got reward 1 up to round n,
N0i(n) - the number of times ad i got reward 0 up to round n.
Step 2: For each ad i, we take a random draw from the distribution below:
θi(n) ~ Beta(N1i(n) + 1, N0i(n) + 1)
Step 3: We select the ad that has the highest θi(n).
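A minimal Python sketch of these three steps, assuming a function get_reward(i) that returns the observed 0/1 reward for showing ad i (a hypothetical stand-in for real click data):

```python
import random

def thompson_sampling(get_reward, n_ads, n_rounds):
    """Thompson Sampling for Bernoulli rewards (ad selection)."""
    successes = [0] * n_ads   # N1_i(n): times ad i returned reward 1
    failures = [0] * n_ads    # N0_i(n): times ad i returned reward 0
    total_reward = 0

    for n in range(n_rounds):
        # Step 2: draw theta_i(n) ~ Beta(N1_i(n) + 1, N0_i(n) + 1) for each ad
        draws = [random.betavariate(successes[i] + 1, failures[i] + 1)
                 for i in range(n_ads)]
        # Step 3: select the ad with the highest draw
        ad = max(range(n_ads), key=lambda i: draws[i])
        reward = get_reward(ad)
        # Step 1 bookkeeping: update the success/failure counts for that ad
        if reward == 1:
            successes[ad] += 1
        else:
            failures[ad] += 1
        total_reward += reward

    return total_reward, successes, failures

# Example usage with hypothetical true click-through rates for 5 ads
rates = [0.05, 0.13, 0.09, 0.16, 0.11]
reward, s, f = thompson_sampling(
    lambda i: 1 if random.random() < rates[i] else 0,
    n_ads=5, n_rounds=10000)
```

Over many rounds, the Beta posteriors of the worse ads stay wide while the best ad's posterior concentrates, so the algorithm gradually shifts from exploration to exploiting the highest-rate ad.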
Upper Confidence Bound | Thompson Sampling |
---|---|
Deterministic | Probabilistic |
Requires an update at every round | Can accommodate delayed feedback |
Less empirical evidence than Thompson Sampling | Better empirical evidence than UCB |